Abstract — In this paper, we approach the classical problem of 
clustering using solution concepts from cooperative game theory 
such as Nucleolus and Shapley value. We formulate the problem 
of clustering as a characteristic form game and develop a novel 
algorithm DRAC (Density-Restricted Agglomerative Clustering) 
for clustering. With extensive experimentation on standard data 
sets, we compare the performance of DRAC with that of well 
known algorithms. We show an interesting result that four
prominent solution concepts, Nucleolus, Shapley value, Gately
point and $\tau$-value, coincide for the defined characteristic form
game. This vindicates the choice of the characteristic function of
the clustering game and also provides a strong intuitive foundation
for our approach.

Index Terms — Pattern clustering, Characteristic form game, 
Nucleolus, Shapley value. 



I. Introduction 

CLUSTERING or unsupervised classification of patterns 
into groups based on similarity is a very well studied 
problem in pattern recognition, data mining, information re- 
trieval, and related disciplines. Besides, clustering has also 
been used in solving extremely large scale problems. Clus- 
tering also acts as a precursor to many data processing tasks 
including classification. According to Backer and Jain Q, in 
cluster analysis, a group of objects is split into a number 
of more or less homogeneous subgroups on the basis of an 
often subjectively chosen measure of similarity (i.e., chosen 
subjectively based on its ability to create interesting clusters) 
such that the similarity between objects within a subgroup 
is larger than the similarity between objects belonging to 
different subgroups. A key problem in the clustering domain 
is to determine the number of output clusters k. Use of 
cooperative game theory provides a novel way of addressing 
this problem by using a variety of solution concepts. 

In the rest of this section, we justify the use of game
theoretic solution concepts, specifically Nucleolus, for pattern
clustering, give an intuition for why the various solution concepts
coincide, and refer to a few recent works in clustering using
game theory. In Section II, we provide a brief introduction
to the relevant solution concepts in cooperative game theory.
Section III explains our model and algorithm for clustering
based on cooperative game theory. In Section IV, we describe
the experimental results and provide a comparison of our
algorithm with some existing related ones. The coincidence of
Nucleolus, Shapley value, Gately point and $\tau$-value with the
chosen characteristic function is discussed and formally proved
in Section V. We conclude with future work in Section VI.

We motivate the use of game theory for pattern clustering
with an overview of a previous approach. SHARPC [1] proposes
a novel approach to find the cluster centers in order to
give a good start to K-means, which thus results in the desired
clustering. The limitation of this approach is that it is restricted
to K-means, which is not always desirable, especially when
the classes have unequal variances or when they are not
convex. We therefore extend this approach to a more general
clustering problem in $\mathbb{R}^2$.

As will be clear in Section II, Shapley value is based
on average fairness, Gately point is based on stability,
$\tau$-value is based on efficiency, while Nucleolus is based on
both min-max fairness and stability. Hence, it is worthwhile
exploring these solution concepts to harness their properties
for the clustering game. Of these solution concepts, the
properties of Nucleolus, viz., fairness and stability, are the
most suitable for the clustering game. Moreover, we show
in Section V that all these solution concepts coincide for
the chosen characteristic function. As finding Nucleolus, for
instance, is computationally expensive, it is to our advantage if
we use the computational ease of other solution concepts. We
see in Section III that for the chosen characteristic function,
the Shapley value can be computed in polynomial time. So
for our algorithm, we use Shapley value, which is equivalent
to using any or all of these solution concepts.

The prime reason for the coincidence of the relevant solution
concepts is that the core, which we will see in Section II-A, is
symmetric about a single point and all these solution concepts
coincide with that very point. We will discuss this situation in
detail and prove it formally in Section V.

There have been approaches proposing the use of game
theory for pattern clustering. Garg, Narahari and Murthy
[1] propose the use of Shapley value to give a good start
to K-means. Gupta and Ranganathan [11], [12] use a
microeconomic game theoretic approach for clustering, which
simultaneously optimizes two objectives, viz. compaction and
equipartitioning. Bulo and Pelillo [10] use the concept of
evolutionary games for hypergraph clustering. Chun and Hokari
[8] prove the coincidence of Nucleolus and Shapley value for
queueing problems.

The contributions of our work are as follows: 

• We explore game theoretic solution concepts for the 
clustering problem. 

• We prove coincidence of Nucleolus, Shapley value,
Gately point and $\tau$-value for the defined game.

• We propose an algorithm, DRAC (Density-Restricted
Agglomerative Clustering), which overcomes the limitations
of K-means, Agglomerative clustering, DBSCAN [13]
and OPTICS [14] using game theoretic solution concepts.

II. Preliminaries 

In this section, we provide a brief insight into the cooper- 
ative game theory concepts H, Q, (8l viz. Core, Nucleolus, 
Shapley value, Gately point and $\tau$-value.

A cooperative game $(N, v)$ consists of two parameters
$N$ and $v$. $N$ is the set of players and $v : 2^N \to \mathbb{R}$ is
the characteristic function. It defines the value $v(S)$ of any
coalition $S \subseteq N$.



A. The Core 

Let $(N, v)$ be a coalitional game with transferable utility
(TU). Let $x = (x_1, \ldots, x_n)$, where $x_i$ represents the payoff
of player $i$. The core consists of all payoff allocations $x$
that satisfy the following properties.

1) individual rationality, i.e., $x_i \geq v(\{i\}) \;\forall\, i \in N$

2) collective rationality, i.e., $\sum_{i \in N} x_i = v(N)$

3) coalitional rationality, i.e., $\sum_{i \in S} x_i \geq v(S) \;\forall\, S \subseteq N$

A payoff allocation satisfying individual rationality and
collective rationality is called an imputation.
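
As a minimal sketch of these three conditions, the following enumerates all coalitions of a small game and tests a candidate allocation. The function name `in_core`, the tolerance, and the toy 3-player game values are illustrative assumptions and do not come from the clustering game defined later.

```python
from itertools import combinations

def in_core(x, v, tol=1e-9):
    """Return True if allocation x (dict: player -> payoff) lies in the core.

    v is assumed to be a callable mapping a frozenset of players to its value.
    """
    players = sorted(x)
    # Collective rationality: the payoffs must sum to v(N).
    if abs(sum(x.values()) - v(frozenset(players))) > tol:
        return False
    # Individual and coalitional rationality: every proper coalition
    # must receive at least its own value.
    for size in range(1, len(players)):
        for coalition in combinations(players, size):
            if sum(x[i] for i in coalition) < v(frozenset(coalition)) - tol:
                return False
    return True

# Toy 3-player game; the numbers are chosen only for illustration.
toy = {
    frozenset(): 0,
    frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
    frozenset({1, 2}): 4, frozenset({1, 3}): 3, frozenset({2, 3}): 2,
    frozenset({1, 2, 3}): 9,
}
v = toy.__getitem__

print(in_core({1: 4, 2: 3, 3: 2}, v))  # satisfies all three conditions
print(in_core({1: 8, 2: 1, 3: 0}, v))  # coalition {2, 3} gets less than v({2, 3})
```

Enumeration is exponential in the number of players, so this is only meant for checking small examples by hand.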

B. The Nucleolus 

Nucleolus is an allocation that minimizes the dissatisfaction 
of the players from the allocation they can receive in a 
game 0. For every imputation x, consider the excess defined 
by 



$$e_S(x) = v(S) - \sum_{i \in S} x_i$$

$e_S(x)$ is a measure of unhappiness of $S$ with $x$. The goal
of Nucleolus is to minimize the unhappiness of the most unhappy
coalition, i.e., the largest of the $e_S(x)$. The linear programming
problem formulation is as follows.

$$\min z$$

subject to

$$z + \sum_{i \in S} x_i \geq v(S) \quad \forall\, S \subset N$$

$$\sum_{i \in N} x_i = v(N)$$



The reader is referred to Q for the detailed properties of 
Nucleolus. It combines a number of fairness criteria with 
stability. It is the imputation which is lexicographically central 
and thus fair and optimum in the min-max sense. 
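
The following sketch illustrates the quantity that the Nucleolus minimizes; it does not solve the linear program. It computes the excess of every proper nonempty coalition and sorts the excesses from most to least unhappy, so that two imputations can be compared lexicographically. The helper name `excess_vector` and the toy game are assumptions made for illustration.

```python
from itertools import combinations

def excess_vector(x, v):
    """Excesses v(S) - sum of x_i over S, for proper nonempty coalitions,
    sorted in non-increasing order (most unhappy coalition first)."""
    players = sorted(x)
    excesses = []
    for size in range(1, len(players)):
        for coalition in combinations(players, size):
            payoff = sum(x[i] for i in coalition)
            excesses.append(v(frozenset(coalition)) - payoff)
    return sorted(excesses, reverse=True)

# Toy 3-player game; the numbers are chosen only for illustration.
toy = {
    frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
    frozenset({1, 2}): 4, frozenset({1, 3}): 3, frozenset({2, 3}): 2,
    frozenset({1, 2, 3}): 9,
}
v = toy.__getitem__

x = {1: 4, 2: 3, 3: 2}
y = {1: 5, 2: 3, 3: 1}
print(excess_vector(x, v))
print(excess_vector(y, v))
# Python compares lists lexicographically, which matches the ordering
# used to rank imputations here.
print(excess_vector(x, v) < excess_vector(y, v))
```

In this toy comparison, `x` is preferred to `y` because its most unhappy coalition is less unhappy.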

C. The Shapley Value 

Any imputation $\phi = (\phi_1, \ldots, \phi_n)$ is a Shapley value if it
follows the axioms which are based on the idea of fairness.
The reader is referred to [4] for the detailed axioms. For any
general coalitional game with transferable utility $(N, v)$, the
Shapley value of player $i$ is given by

$$\phi_i = \frac{1}{n!} \sum_{\substack{S \subseteq N \\ i \in S}} (|S| - 1)!\,(n - |S|)!\,[v(S) - v(S - i)] = \frac{1}{n!} \sum_{\pi \in \Pi} x_i^{\pi}$$

$\Pi$ = set of all permutations on $N$

$x_i^{\pi}$ = contribution of player $i$ to permutation $\pi$
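
The permutation form of the definition can be sketched directly, assuming a small game given as a lookup table. The function name `shapley_by_permutations` and the toy game are illustrative assumptions; the enumeration over all $n!$ orderings is only practical for a handful of players.

```python
from itertools import permutations
from math import factorial

def shapley_by_permutations(players, v):
    """Average marginal contribution of each player over all orderings."""
    totals = {i: 0.0 for i in players}
    for order in permutations(players):
        coalition = frozenset()
        for i in order:
            with_i = coalition | {i}
            # Contribution of player i when joining the players before it.
            totals[i] += v(with_i) - v(coalition)
            coalition = with_i
    n_fact = factorial(len(players))
    return {i: totals[i] / n_fact for i in players}

# Toy 3-player game; the numbers are chosen only for illustration.
toy = {
    frozenset(): 0,
    frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
    frozenset({1, 2}): 4, frozenset({1, 3}): 3, frozenset({2, 3}): 2,
    frozenset({1, 2, 3}): 9,
}
phi = shapley_by_permutations([1, 2, 3], toy.__getitem__)
print(phi)
```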



D. The Gately Point 

Player i's propensity to disrupt the grand coalition is defined 
to be the following ratio H. 



$$d_i(x) = \frac{\sum_{j \neq i} x_j - v(N - i)}{x_i - v(i)} \tag{1}$$

If $d_i(x)$ is large, player $i$ may lose something by deserting
the grand coalition, but others will lose a lot more. The
Gately point of a game is the imputation which minimizes
the maximum propensity to disrupt. The general way to
minimize the largest propensity to disrupt is to make all of the
propensities to disrupt equal. When the game is normalized so
that $v(i) = 0$ for all $i$, the way to set all the $d_i(x)$ equal is to
choose $x_i$ in proportion to $v(N) - v(N - i)$.

$$Gv_i = \frac{v(N) - v(N - i)}{\sum_{j \in N} \big(v(N) - v(N - j)\big)}\, v(N)$$

E. The $\tau$-value



The $\tau$-value is the unique solution concept which is efficient and
has the minimal right property and the restricted proportionality
property. The reader is referred to [6] for the details of
these properties. For each $i \in N$, let

$$M_i(v) = v(N) - v(N - i) \quad \text{and} \quad m_i(v) = v(i) \tag{2}$$

Then the $\tau$-value selects the maximal feasible allocation on
the line connecting $M(v) = (M_i(v))_{i \in N}$ and $m(v) =
(m_i(v))_{i \in N}$ [?]. For each convex game $(N, v)$,

$$\tau(v) = \lambda M(v) + (1 - \lambda)\, m(v) \tag{3}$$

where $\lambda \in [0, 1]$ is chosen so as to satisfy

$$\sum_{i \in N} \left[ \lambda \big(v(N) - v(N - i)\big) + (1 - \lambda)\, v(i) \right] = v(N) \tag{4}$$
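
A minimal sketch of Equations (2) to (4), assuming a convex game supplied as a lookup table: since Equation (4) is linear in $\lambda$, it can be solved in closed form. The function name `tau_value` and the toy game are assumptions for illustration, and the sketch does not check convexity itself.

```python
def tau_value(players, v):
    """Return (lambda, tau) following Equations (2)-(4) of this section."""
    grand = frozenset(players)
    # Equation (2): M_i(v) = v(N) - v(N - i) and m_i(v) = v(i).
    M = {i: v(grand) - v(grand - {i}) for i in players}
    m = {i: v(frozenset({i})) for i in players}
    sum_M, sum_m = sum(M.values()), sum(m.values())
    # Equation (4): lambda * sum_M + (1 - lambda) * sum_m = v(N).
    # If both sums coincide, any lambda works; 1.0 is picked arbitrarily.
    lam = 1.0 if sum_M == sum_m else (v(grand) - sum_m) / (sum_M - sum_m)
    # Equation (3): point on the segment between m(v) and M(v).
    tau = {i: lam * M[i] + (1 - lam) * m[i] for i in players}
    return lam, tau

# Toy convex 3-player game; the numbers are chosen only for illustration.
toy = {
    frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
    frozenset({1, 2}): 4, frozenset({1, 3}): 3, frozenset({2, 3}): 2,
    frozenset({1, 2, 3}): 9,
}
lam, tau = tau_value([1, 2, 3], toy.__getitem__)
print(lam, tau)
```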

III. A Model and Algorithm for Clustering based 
on Cooperative Game Theory 

For the clustering game, the characteristic function is chosen
as in [1].

$$v(S) = \frac{1}{2} \sum_{\substack{i, j \in S \\ i \neq j}} f(d(i, j)) \tag{5}$$

In Equation (5), $d$ is the Euclidean distance and $f : d \to [0, 1]$ is
a similarity function. Intuitively, if two points $i$ and $j$ have
small Euclidean distance, then $f(d(i, j))$ approaches 1. The
similarity function that we use in our implementation is

$$f(d(i, j)) = 1 - \frac{d(i, j)}{d_M} \tag{6}$$

where $d_M$ is the maximum of the distances between all pairs
of points in the dataset.
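
The similarity function and the characteristic function can be sketched as follows, assuming points are given as coordinate tuples and that at least two of them are distinct (so that the maximum distance is positive). The names `similarity_matrix` and `coalition_value` are illustrative and not taken from the original implementation.

```python
from itertools import combinations
from math import dist

def similarity_matrix(points):
    """f(d(i, j)) = 1 - d(i, j) / d_M for every pair of points."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in d)  # assumed to be positive
    return [[1.0 - d[i][j] / d_max for j in range(n)] for i in range(n)]

def coalition_value(S, f):
    """v(S): half the sum over ordered pairs i != j in S, which equals
    the sum of f over unordered pairs in S."""
    return sum(f[i][j] for i, j in combinations(sorted(S), 2))

# Three collinear points; distances are 5, 5 and 10, so d_M = 10.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
f = similarity_matrix(points)
print(f[0][1], f[0][2])
print(coalition_value({0, 1}, f), coalition_value({0, 1, 2}, f))
```

A singleton coalition contains no pair, so its value is 0 under this definition.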

When Equation (5) is used as the characteristic function, it is
shown in [1] that the Shapley value of player $i$ can be computed
in polynomial time and is given by

$$\phi_i = \frac{1}{2} \sum_{\substack{j \in N \\ j \neq i}} f(d(i, j)) \tag{7}$$

Also, from Equation (5), it can be derived that

$$v(S) = \sum_{\substack{T \subseteq S \\ |T| = 2}} v(T) \tag{8}$$
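
As a sanity check on a small instance, the closed form of Equation (7) can be compared against a brute-force average over all orderings of the players. The similarity matrix below is a hypothetical example, and the function names are our own.

```python
from itertools import combinations, permutations
from math import factorial

def coalition_value(S, f):
    """v(S) from Equation (5): f summed once over each unordered pair in S."""
    return sum(f[i][j] for i, j in combinations(sorted(S), 2))

def shapley_closed_form(f):
    """Equation (7): half the total similarity of a point to all others."""
    n = len(f)
    return [0.5 * sum(f[i][j] for j in range(n) if j != i) for i in range(n)]

def shapley_brute_force(f):
    """Average marginal contribution over all n! orderings (small n only)."""
    n = len(f)
    totals = [0.0] * n
    for order in permutations(range(n)):
        seen = set()
        for i in order:
            totals[i] += coalition_value(seen | {i}, f) - coalition_value(seen, f)
            seen.add(i)
    return [t / factorial(n) for t in totals]

# Hypothetical symmetric similarity matrix for three points.
f = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
closed = shapley_closed_form(f)
brute = shapley_brute_force(f)
print(closed, brute)
```

The closed form needs $O(n)$ work per point once the similarities are known, which is what makes the Shapley value a convenient base for the algorithm.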



In Sections I and II, we have discussed the benefits of
imputations resulting from various game theoretic solution
concepts. Also, in Section V, we will show that all these
imputations coincide. Moreover, as Equation (7) shows the ease
of computation of Shapley value in the clustering game with
the chosen characteristic function, we use Shapley value as
the base solution concept for our algorithm.

The basic idea behind the algorithm is that we expand the
clusters based on density. From Equations (6) and (7), Shapley
value represents density in some sense. For every cluster, we
start with an unallocated point with the maximum Shapley
value and assign it as the cluster center. If that point has
high density around it, it should only consider the close-by
points; otherwise it should consider more faraway points. We
implement this idea in step 5 of Algorithm 1 with parameter
$\beta$. For the point with the globally maximum Shapley value,
$\beta = \delta$, while it is lower for other cluster centers. Also, as
we go from the cluster center with the highest Shapley value
to those with lower values, we do not want to degrade the
value of $\beta$ linearly. So we have a square-root function in step 5.
Alternatively, it can be replaced with any other function which
ensures sub-linear degradation of $\beta$. The input parameters $\delta$
and $\gamma$ should be changed accordingly.

Algorithm 1 Density-Restricted Agglomerative Clustering (DRAC)

Require: Dataset, maximum threshold for similarity $\delta \in [0, 1]$
and threshold for Shapley value multiplicity $\gamma \in [0, 1]$
1: Find the pairwise similarity between all points in dataset.
2: For each point $i$, compute the Shapley value using
Equations (6) and (7).
3: Arrange the points in non-increasing order of their Shapley
values. Let $\phi_M$ be the global maximum of Shapley values.
Start a new queue, let's call it expansion queue.
4: Start a new cluster. Of all the unallocated points, choose
the point with maximum Shapley value as the new cluster
center. Let $l_M$ be its Shapley value. Mark that point as
allocated. Add it to the expansion queue.
5: Set $\beta = \delta \sqrt{l_M / \phi_M}$.
6: For each unallocated point, if the similarity of that point
to the first point in the expansion queue is at least $\beta$, add
it to the current cluster and mark it as allocated. If the
Shapley value of that point is at least $\gamma$-multiple of $l_M$,
add it to the expansion queue.
7: Remove the first point from the expansion queue.
8: If the expansion queue is not empty, go to step 6.
9: If the cluster center is the only point in its cluster, mark
it as noise.
10: If all points are allocated a cluster, terminate. Else go to
step 4.

Secondly, when the density around a point is very low as
compared to the density around the center of the cluster
of which it is a part, it should not be responsible for further
growth of the cluster. This ensures that clusters are not merged
together when they are connected with a thin bridge of points.
It also ensures that the density within a cluster does not vary
beyond a certain limit. We implement this idea with what we
call an expansion queue. We add points to the queue only if
their Shapley value is at least $\gamma$-multiple of that of the
center of the cluster of which they are a part. The expansion
queue is responsible for the growth of a cluster and it ceases
once the queue is empty. The detailed and systematic steps
are given in Algorithm 1.
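
The steps of the algorithm can be sketched in Python as follows. This is a simplified reading of the listing, not the original implementation: the function name `drac`, the label convention (-1 for noise), and the example points and thresholds (similarity threshold `delta` and Shapley multiplicity threshold `gamma`) are assumptions made for illustration. The sketch assumes at least two distinct points.

```python
from collections import deque
from math import dist, sqrt

def drac(points, delta, gamma):
    """Return one cluster label per point; -1 marks noise."""
    n = len(points)
    # Step 1: pairwise similarities, f = 1 - d / d_max.
    d = [[dist(p, q) for q in points] for p in points]
    d_max = max(max(row) for row in d)
    f = [[1.0 - d[i][j] / d_max for j in range(n)] for i in range(n)]
    # Step 2: Shapley values in closed form.
    phi = [0.5 * sum(f[i][j] for j in range(n) if j != i) for i in range(n)]
    # Step 3: non-increasing order; phi_max is the global maximum.
    order = sorted(range(n), key=lambda i: -phi[i])
    phi_max = phi[order[0]]
    labels = [None] * n
    next_label = 0
    for center in order:
        if labels[center] is not None:
            continue  # already allocated to an earlier cluster
        # Step 4: unallocated point with the highest Shapley value.
        labels[center] = next_label
        l_center = phi[center]
        # Step 5: similarity threshold shrinks sub-linearly with density.
        ratio = l_center / phi_max if phi_max > 0 else 1.0
        beta = delta * sqrt(ratio)
        queue = deque([center])
        size = 1
        # Steps 6-8: grow the cluster from the front of the expansion queue.
        while queue:
            head = queue.popleft()
            for j in range(n):
                if labels[j] is None and f[head][j] >= beta:
                    labels[j] = next_label
                    size += 1
                    if phi[j] >= gamma * l_center:
                        queue.append(j)
        # Step 9: a center that attracted no point is noise.
        if size == 1:
            labels[center] = -1
        else:
            next_label += 1
    return labels

# Two tight groups and one isolated point (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (30, 0)]
labels = drac(points, delta=0.9, gamma=0.9)
print(labels)
```

Iterating over the points in sorted order and skipping allocated ones is equivalent to repeatedly choosing the unallocated point with the maximum Shapley value, and it avoids re-sorting after each cluster.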

IV. Experimental Results 

In this section, we qualitatively compare our algorithm with
some existing related algorithms. SHARPC [1] gives a good
start to K-means using a game theoretic solution concept, viz.,
the Shapley value. As our algorithm hierarchically allocates
points to the cluster starting from a cluster center, we compare
it with Agglomerative Clustering. The way our characteristic
function and similarity function are defined, the Shapley value
represents density in some sense. So we compare our algorithm
with the density-based ones, viz., DBSCAN (Density-Based
Spatial Clustering of Applications with Noise) and
OPTICS (Ordering Points To Identify the Clustering Structure).
Throughout this section, 'cluster (<colored marker>)'
refers to the cluster marked by that colored marker in the
corresponding figure. Noise is represented by (o).
Fig. 1. Clusters as discovered by SHARPC

Figure 1 shows the clusters formed by SHARPC [1], which
tries to allocate clusters by enclosing points in equal-sized
spheres. It cannot detect clusters that are not convex. Also,
the cluster (x) is a merging of three different clusters. If the
threshold is increased so as to solve the second problem, more
clusters are formed and the larger clusters get subdivided into
several smaller clusters.

Fig. 2. Clusters as discovered by Agglomerative Clustering

Fig. 3. Clusters as discovered by DBSCAN

Fig. 4. Clusters as discovered by OPTICS

Fig. 5. Clusters as discovered by DRAC

Agglomerative Clustering, as Figure 2 shows, can detect
clusters of any shape and size. But owing to a constant
threshold for the growth of all clusters, it faces the problem
of forming several clusters in the lower right part when they
should have been part of one single cluster. If the threshold
is decreased so as to solve this problem, clusters (*) and (*)
get merged. Another problem is that the bridge connecting the
two classes merges these into one single cluster (*).

Figure 3 shows the results of DBSCAN [13]. It is well
known that it cannot detect clusters with different densities
in general. The points in the lower right part are detected
as noise when intuitively, the region is dense enough to be
classified as a cluster. An attempt to do so compromises the
classification of clusters (*) and (*) as distinct. Moreover, the
bridge connecting the two classes merges them into one single
cluster (*). An attempt to do the required classification leads
to unnecessary subdivision of the rightmost class and more
points being detected as noise.

The clustering obtained using OPTICS [14] is shown in
Figure 4. Unlike DBSCAN, clusters (*) and (*) are detected
as distinct. However, the points in the lower right part are
detected as noise when they should have been classified as one
cluster. The reachability plots for different values of minpts are
such that an attempt to classify some of these points as a part
of some cluster leads to the merging of clusters (*) and (*). If
we continue trying to get more of these points allocated, the
bridge plays the role of merging the two clusters (*) and (*).

Figure 5 shows the clustering obtained using Density-Restricted
Agglomerative Clustering (DRAC). As cluster (+)
is highly dense, its cluster center has a very high Shapley value,
resulting in a very high value of $\beta$, the similarity threshold.
No point in cluster (*) crosses the required similarity threshold
with the points in cluster (+), thus ensuring that the two
clusters are not merged. The points in the central part of
the bridge have extremely low Shapley values as compared
to the cluster center of cluster (+) and so they fail to cross
the Shapley value threshold of having at least $\gamma$-multiple of the
Shapley value of the cluster center. This ensures that they are
not added to the expansion queue of the cluster, thus preventing
the cluster growth from extending to cluster (*). Cluster (*) extends
to the relatively low density region because of points being
added to the expansion queue owing to their sufficiently high
Shapley value, at least $\gamma$-multiple of the Shapley value of the
cluster center. Cluster (x) is a low density cluster owing to
the low Shapley value of the cluster center and hence the low value
of $\beta$, the similarity threshold, thus allowing more faraway
points to be a part of the cluster. Cluster centers which fail
to agglomerate at least one point with their respective values
of $\beta$ are marked as noise.

Like other clustering algorithms, Algorithm 1 faces some
limitations. As it uses Equations (6) and (7) to compute the
Shapley values, the Shapley value of a point changes even
when a remote point is altered, which may change its cluster
allocation. For the same reason, the Shapley values of the
points close to the mean of the whole dataset are higher than
those of other points even when the density around them is not as high.
One solution to this problem is to take the positioning of the
point into account while computing its Shapley value. There
is no explicit noise detection. A point is marked as noise if
it is the only point in its cluster. For instance, in Figure 5,
the two points in the upper right corner are noise points, but
owing to their low Shapley values, $\beta$ is very low and so they
are classified as a separate cluster (A) instead. The amortized
time complexity of Algorithm 1 is $O(n^2)$.
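
The first two limitations can be seen on a tiny example. In the sketch below, five evenly spaced points on a line have uniform local density, yet the closed-form Shapley value peaks at the middle point; moving only the last point also changes the value of the first one. The helper name `shapley_values` and the data are illustrative assumptions.

```python
from math import dist

def shapley_values(points):
    """Closed-form Shapley values with f = 1 - d / d_max."""
    n = len(points)
    d_max = max(dist(p, q) for p in points for q in points)
    return [
        0.5 * sum(1.0 - dist(points[i], points[j]) / d_max
                  for j in range(n) if j != i)
        for i in range(n)
    ]

# Uniformly spaced points: same local density everywhere.
evenly_spaced = [(float(k), 0.0) for k in range(5)]
phi = shapley_values(evenly_spaced)
print(phi)

# Move only the remote last point; the first point itself is unchanged.
moved = evenly_spaced[:-1] + [(8.0, 0.0)]
phi_moved = shapley_values(moved)
print(phi[0], phi_moved[0])
```

The change in the second case comes entirely from the normalization by the maximum pairwise distance, which is a global quantity.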

V. Coincidence of Nucleolus, Shapley Value,
Gately Point and $\tau$-value in the Current Setting

In the game as defined in Section III, we show in this
section that Nucleolus, Shapley value, Gately point and
$\tau$-value coincide. First, we discuss the structure of the core. The
core is symmetric about a single point, which is the prime
reason why the above solution concepts coincide with that
very point.



Fig. 6. The game has a symmetric core. This figure shows the core for a
3-player game.

Figure 6 shows the core for a 3-player cooperative game,
in our case, a 3-point clustering game. The STR plane corresponds
to the collective rationality constraint, sides AF, BC, DE
correspond to individual rationality constraints, while sides
AB, CD, EF correspond to coalitional rationality constraints.
The reader is referred to [4] for a detailed discussion on the
imputation triangle of a 3-player cooperative game. By simple
geometry and theory on the imputation triangle, it can be seen
that $AB = DE = v(\{2, 3\})\sqrt{2}$. Similarly, all opposite sides
of the core are equal and so the core is symmetric about its
center P.

Clearly, any point other than P will have more distance
from at least one side and so will be lexicographically greater
than P, which means that P is the Nucleolus. Also, as the
core is symmetric, it is intuitive that P is the fairest of all
allocations, which means that it corresponds to the Shapley
value imputation. We prove a general result for the n-player
clustering game that all the relevant solution concepts coincide.

Proposition 1. For the transferable utility (TU) game defined
by Equation (5), for each $i \in N$, the Shapley value is given by

$$\phi_i = \frac{1}{2} \sum_{\substack{S \subseteq N \\ i \in S \\ |S| = 2}} v(S) \tag{9}$$

Proof. From Equations (5) and (7).


Lemma 1. [8] For the TU game satisfying Equation (9), for
each $S \subseteq N$,

$$v(S) - \sum_{i \in S} \phi_i = v(N \setminus S) - \sum_{i \in N \setminus S} \phi_i$$

The reader is referred to [8] for the proof of Lemma 1.
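
The statement of the lemma can be checked numerically on a small clustering game. The sketch below uses a hypothetical symmetric similarity matrix, builds the characteristic function and the closed-form Shapley values, and compares the excess of every coalition with that of its complement. This is only a spot check on one instance, not a proof.

```python
from itertools import combinations

# Hypothetical symmetric similarity matrix f[i][j] for four points.
f = [[1.0, 0.8, 0.3, 0.1],
     [0.8, 1.0, 0.4, 0.2],
     [0.3, 0.4, 1.0, 0.9],
     [0.1, 0.2, 0.9, 1.0]]
n = len(f)
players = set(range(n))

def v(S):
    """Characteristic function: f summed once per unordered pair in S."""
    return sum(f[i][j] for i, j in combinations(sorted(S), 2))

# Closed-form Shapley values of the clustering game.
phi = [0.5 * sum(f[i][j] for j in range(n) if j != i) for i in range(n)]

def excess(S):
    """v(S) minus the total Shapley payoff of the members of S."""
    return v(S) - sum(phi[i] for i in S)

# Largest difference between the excess of S and that of its complement.
max_gap = 0.0
for size in range(1, n):
    for coalition in combinations(range(n), size):
        S = set(coalition)
        max_gap = max(max_gap, abs(excess(S) - excess(players - S)))
print(max_gap)
```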
Theorem 1. [8] For the TU game satisfying Equation (9),

$$\phi(v) = Nu(v)$$

where $Nu(v)$ is the Nucleolus of the TU game $(N, v)$.
The reader is referred to [8] for the proof of Theorem 1.
Theorem 2. For the TU game defined by Equation (5),

$$\phi(v) = Gv(v)$$

where $Gv(v)$ is the Gately point of the TU game $(N, v)$.

Proof. By Lemma 1, when $S = \{i\}$, we have

$$v(i) - \phi_i = v(N - i) - \sum_{j \neq i} \phi_j$$

From Equation (1), the propensity to disrupt for player $i$ when
the imputation is the Shapley value is

$$d_i(\phi) = \frac{\sum_{j \neq i} \phi_j - v(N - i)}{\phi_i - v(i)} = 1$$

As the propensity to disrupt is 1 for every player $i$, it is equal
for all the players and hence, from the theory in Section II-D,
the Shapley value imputation is the Gately point.

$$\phi(v) = Gv(v)$$
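
The key step of the proof, that every player's propensity to disrupt equals 1 at the Shapley value, can be checked on a small instance. The similarity matrix below is a hypothetical example with strictly positive Shapley values, so the ratio is well defined.

```python
from itertools import combinations

# Hypothetical symmetric similarity matrix f[i][j] for four points.
f = [[1.0, 0.8, 0.3, 0.1],
     [0.8, 1.0, 0.4, 0.2],
     [0.3, 0.4, 1.0, 0.9],
     [0.1, 0.2, 0.9, 1.0]]
n = len(f)
grand = set(range(n))

def v(S):
    """Characteristic function: f summed once per unordered pair in S."""
    return sum(f[i][j] for i, j in combinations(sorted(S), 2))

# Closed-form Shapley values of the clustering game.
phi = [0.5 * sum(f[i][j] for j in range(n) if j != i) for i in range(n)]

propensities = []
for i in range(n):
    others = grand - {i}
    # Loss to the other players if i deserts the grand coalition.
    numerator = sum(phi[j] for j in others) - v(others)
    # Loss to player i itself; v({i}) is 0 in this game.
    denominator = phi[i] - v({i})
    propensities.append(numerator / denominator)
print(propensities)
```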



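Theorem 3. For the TU game defined by Equation (5),

$$\phi(v) = \tau(v)$$

where $\tau(v)$ is the $\tau$-value of the TU game $(N, v)$.

Proof. From Equations (2) and (8),

$$M_i(v) = v(N) - v(N - i)
= \sum_{\substack{S \subseteq N \\ |S| = 2}} v(S) - \sum_{\substack{S \subseteq N \setminus \{i\} \\ |S| = 2}} v(S)
= \sum_{\substack{S \subseteq N \\ i \in S \\ |S| = 2}} v(S)$$

This, with Equation (4) and the fact that for our $(N, v)$ game,
for all $i$, $m_i(v) = v(i) = 0$, gives

$$v(N) = \lambda \sum_{i \in N} M_i(v)
= \lambda \sum_{i \in N} \sum_{\substack{S \subseteq N \\ i \in S \\ |S| = 2}} v(S)
= 2\lambda \sum_{\substack{S \subseteq N \\ |S| = 2}} v(S)$$

Using Equation (8), we get $\lambda = \frac{1}{2}$. This, with Equation (3) and
the fact that for all $i$, $m_i(v) = 0$, gives

$$\tau_i(v) = \frac{1}{2} \sum_{\substack{S \subseteq N \\ i \in S \\ |S| = 2}} v(S)$$

This, with Proposition 1, gives

$$\phi(v) = \tau(v)$$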
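
As a worked illustration of the proof, consider a hypothetical game with three equally spaced collinear points, so that $f(d(1,2)) = f(d(2,3)) = \tfrac{1}{2}$ and $f(d(1,3)) = 0$ under the similarity function of Section III. The numbers below are an example of ours and follow directly from the definitions.

```latex
\begin{align*}
v(\{1,2\}) &= \tfrac{1}{2}, \quad v(\{2,3\}) = \tfrac{1}{2}, \quad
v(\{1,3\}) = 0, \quad v(N) = 1, \quad v(i) = 0 \\
M_1(v) &= v(N) - v(\{2,3\}) = \tfrac{1}{2}, \quad
M_2(v) = v(N) - v(\{1,3\}) = 1, \quad
M_3(v) = v(N) - v(\{1,2\}) = \tfrac{1}{2} \\
v(N) &= \lambda \sum_{i \in N} M_i(v) = 2\lambda
\;\Rightarrow\; \lambda = \tfrac{1}{2} \\
\tau(v) &= \tfrac{1}{2}\, M(v) = \left(\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4}\right) \\
\phi_1 &= \tfrac{1}{2}\big(v(\{1,2\}) + v(\{1,3\})\big) = \tfrac{1}{4}, \quad
\phi_2 = \tfrac{1}{2}\big(v(\{1,2\}) + v(\{2,3\})\big) = \tfrac{1}{2}, \quad
\phi_3 = \tfrac{1}{4}
\end{align*}
```

The two allocations agree, and the middle point, which is the most similar to the others, receives the largest share.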



From Theorem 1, Theorem 2 and Theorem 3, the Nucleolus,
the Shapley value, the Gately point and the $\tau$-value coincide
in the clustering game with the chosen characteristic function.
These results further vindicate our choice of characteristic
function for the clustering game.



VI. Conclusion and Future Work 

We have explored game theoretic solution concepts as an
alternative to the existing methods for the clustering problem.
Nucleolus, being both min-max fair and stable,
is the most suitable solution concept for pattern clustering.
We have also proved the coincidence of Nucleolus, Shapley
value, Gately point and $\tau$-value for the given characteristic
function. We have proposed an algorithm, Density-Restricted
Agglomerative Clustering (DRAC), and have provided a
qualitative comparison with the existing algorithms along with its
strengths and limitations.

As future work, it would be interesting to test our method
using Evolutionary game theory and Bargaining concepts. It
would be worthwhile developing a characterization of games
for which various game theoretic solution concepts coincide.